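Scalability and accuracy are well-recognized challenges in deep extreme multi-label learning, where the objective is to train architectures that automatically annotate a data point with the most relevant subset of labels from an extremely large label set. This paper develops the DeepXML framework, which addresses these challenges by decomposing the deep extreme multi-label task into four simpler sub-tasks, each of which can be trained accurately and efficiently. Choosing different components for the four sub-tasks allows DeepXML to generate a family of algorithms with varying trade-offs between accuracy and scalability. In particular, DeepXML yields the ASTEC algorithm, which can be more accurate and 5-30x faster to train than leading deep extreme classifiers on publicly available short-text datasets. ASTEC can also be trained efficiently on Bing short-text datasets containing up to 62 million labels, while making predictions for billions of users and data points on commodity hardware. This has allowed ASTEC to be deployed on the Bing search engine for a number of short-text applications, ranging from matching user queries to advertiser bid phrases to showing personalized ads, where it yielded significant gains in click-through rate, coverage, revenue, and other online metrics over the state-of-the-art techniques currently in production. DeepXML's code is available at https://github.com/extreme-classification/deepxml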
translated by 谷歌翻译
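The abstract reports accuracy gains without naming the metric. Precision@k (the fraction of a data point's k top-scoring labels that are actually relevant) is the metric most commonly reported in extreme classification, so below is a minimal sketch of top-k annotation and precision@k. The labels, scores and relevance set are invented for illustration. Scoring and sorting every label like this is only viable at toy scale, and that cost is exactly what large-label systems are built to avoid.

```python
from typing import Dict, List, Set


def top_k_labels(scores: Dict[str, float], k: int) -> List[str]:
    """Return the k highest-scoring labels (ties broken alphabetically)."""
    return sorted(scores, key=lambda lbl: (-scores[lbl], lbl))[:k]


def precision_at_k(scores: Dict[str, float], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k predicted labels that are in the relevant set."""
    predicted = top_k_labels(scores, k)
    return sum(lbl in relevant for lbl in predicted) / k


# Hypothetical label scores for one short-text query, for illustration only.
scores = {
    "running shoes": 0.91,
    "trail shoes": 0.85,
    "sandals": 0.40,
    "hiking boots": 0.77,
    "socks": 0.10,
}
relevant = {"running shoes", "hiking boots"}

print(top_k_labels(scores, 3))  # ['running shoes', 'trail shoes', 'hiking boots']
print(precision_at_k(scores, relevant, 3))  # 2 of the 3 predictions are relevant
```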
To build general robotic agents that can operate in many environments, it is often imperative for the robot to collect experience in the real world. However, this is often not feasible due to safety, time, and hardware restrictions. We thus propose leveraging the next best thing to real-world experience: internet videos of humans using their hands. Visual priors, such as visual features, are often learned from videos, but we believe that more information from videos can be utilized as a stronger prior. We build a learning algorithm, VideoDex, that leverages visual, action, and physical priors from human video datasets to guide robot behavior. These action and physical priors in the neural network encode typical human behavior for a particular robot task. We test our approach on a robot arm and dexterous hand-based system and show strong results on various manipulation tasks, outperforming various state-of-the-art methods. Videos are available at https://video-dex.github.io
Contrastive Language-Image Pre-training (CLIP) has emerged as a simple yet effective way to train large-scale vision-language models. CLIP demonstrates impressive zero-shot classification and retrieval on diverse downstream tasks. However, to leverage its full potential, fine-tuning still appears to be necessary. Fine-tuning the entire CLIP model can be resource-intensive and unstable. Moreover, recent methods that aim to circumvent this need for fine-tuning still require access to images from the target distribution. In this paper, we pursue a different approach and explore the regime of training-free "name-only transfer" in which the only knowledge we possess about the downstream task comprises the names of downstream target categories. We propose a novel method, SuS-X, consisting of two key building blocks -- SuS and TIP-X, that requires neither intensive fine-tuning nor costly labelled data. SuS-X achieves state-of-the-art zero-shot classification results on 19 benchmark datasets. We further show the utility of TIP-X in the training-free few-shot setting, where we again achieve state-of-the-art results over strong training-free baselines. Code is available at https://github.com/vishaal27/SuS-X.
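In the "name-only" regime, the only task knowledge is the list of class names. The usual starting point is CLIP-style zero-shot classification: embed one text prompt per class name, then take a softmax over scaled cosine similarities with the image embedding. The sketch below shows only that baseline step. It is not SuS or TIP-X. The 3-d "embeddings" are hypothetical stand-ins for real encoder outputs, and the temperature of 100 is an assumption.

```python
import math
from typing import Dict, List


def normalize(v: List[float]) -> List[float]:
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def zero_shot_probs(
    image_emb: List[float],
    class_embs: Dict[str, List[float]],
    temperature: float = 100.0,
) -> Dict[str, float]:
    """Softmax over temperature-scaled cosine similarities to per-class text embeddings."""
    img = normalize(image_emb)
    logits = {
        name: temperature * sum(a * b for a, b in zip(img, normalize(emb)))
        for name, emb in class_embs.items()
    }
    top = max(logits.values())  # subtract the max logit for numerical stability
    exps = {name: math.exp(l - top) for name, l in logits.items()}
    total = sum(exps.values())
    return {name: e / total for name, e in exps.items()}


# Hypothetical text embeddings of prompts such as "a photo of a {name}".
class_embs = {"cat": [1.0, 0.1, 0.0], "dog": [0.1, 1.0, 0.0], "car": [0.0, 0.0, 1.0]}
image_emb = [0.9, 0.2, 0.05]  # hypothetical image embedding

probs = zero_shot_probs(image_emb, class_embs)
print(max(probs, key=probs.get))  # cat
```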
Generic motion understanding from video involves not only tracking objects, but also perceiving how their surfaces deform and move. This information is useful to make inferences about 3D shape, physical properties and object interactions. While the problem of tracking arbitrary physical points on surfaces over longer video clips has received some attention, no dataset or benchmark for evaluation existed, until now. In this paper, we first formalize the problem, naming it tracking any point (TAP). We introduce a companion benchmark, TAP-Vid, which is composed of both real-world videos with accurate human annotations of point tracks, and synthetic videos with perfect ground-truth point tracks. Central to the construction of our benchmark is a novel semi-automatic crowdsourced pipeline which uses optical flow estimates to compensate for easier, short-term motion like camera shake, allowing annotators to focus on harder sections of video. We validate our pipeline on synthetic data and propose a simple end-to-end point tracking model TAP-Net, showing that it outperforms all prior methods on our benchmark when trained on synthetic data.
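The abstract does not define how predicted point tracks are scored. As an illustrative stand-in (not the benchmark's official metric, which is defined in the paper along with how occlusion predictions are scored), the sketch below averages, over several pixel thresholds, the fraction of visible frames in which the predicted point lies within that threshold of the ground truth. The threshold set (1 to 16 pixels) and the toy track are assumptions.

```python
import math
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]


def position_accuracy(
    pred: Sequence[Point],
    gt: Sequence[Optional[Point]],
    thresholds: Sequence[float] = (1, 2, 4, 8, 16),
) -> float:
    """Mean over thresholds of the fraction of visible frames with error < threshold.

    gt[t] is None when the point is occluded in frame t; such frames are skipped.
    """
    visible = [(p, g) for p, g in zip(pred, gt) if g is not None]
    if not visible:
        raise ValueError("no visible frames to score")
    per_threshold = [
        sum(math.dist(p, g) < d for p, g in visible) / len(visible) for d in thresholds
    ]
    return sum(per_threshold) / len(per_threshold)


# Hypothetical 4-frame track; the point is occluded in frame 2.
gt = [(10.0, 10.0), (12.0, 10.0), None, (16.0, 11.0)]
pred = [(10.0, 10.0), (12.0, 13.0), (50.0, 50.0), (26.0, 11.0)]
print(position_accuracy(pred, gt))  # (1/3 + 1/3 + 2/3 + 2/3 + 1) / 5 = 0.6
```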
Dynamic movement primitives are widely used for learning skills that can be demonstrated to a robot by a skilled human or controller. While their generalization capabilities and simple formulation make them very appealing to use, they provide no strong guarantees of satisfying operational safety constraints for a task. In this paper, we present constrained dynamic movement primitives (CDMP), which allow for constraint satisfaction in the robot workspace. We formulate a non-linear optimization that perturbs the DMP forcing weights, regressed by locally weighted regression, so that they admit a Zeroing Barrier Function (ZBF), which certifies workspace constraint satisfaction. We demonstrate the proposed CDMP on a physical robot under different constraints on end-effector movement, such as obstacle avoidance and workspace constraints. A video showing the implementation of the proposed algorithm with different manipulators in different environments can be found at https://youtu.be/hJegJJkJfys.
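Below is a minimal 1-D sketch of two ingredients the abstract names. The first is a discrete DMP whose forcing weights are regressed from a demonstration by locally weighted regression. The second is a check of the zeroing-barrier-function conditions h >= 0 and dh/dt + gamma * h >= 0 along a rollout, for a hypothetical workspace bound h(y) = y_max - y. It does not implement the paper's optimization that perturbs the weights. It only certifies or rejects a given rollout. The gains, the basis-width heuristic, the minimum-jerk demonstration, y_max and gamma are all assumptions.

```python
import math
from typing import List, Tuple

ALPHA_Z = 25.0          # spring gain (assumed)
BETA_Z = ALPHA_Z / 4.0  # gives a critically damped spring-damper
ALPHA_X = 1.0           # canonical-system decay rate (assumed)
TAU = 1.0               # temporal scaling


def min_jerk(t: float) -> Tuple[float, float, float]:
    """Hypothetical demonstration: minimum-jerk motion 0 -> 1 over 1 s (pos, vel, acc)."""
    return (10 * t**3 - 15 * t**4 + 6 * t**5,
            30 * t**2 - 60 * t**3 + 30 * t**4,
            60 * t - 180 * t**2 + 120 * t**3)


def basis(x: float, centers: List[float], widths: List[float]) -> List[float]:
    """Gaussian basis activations psi_i(x) over the canonical phase x."""
    return [math.exp(-h * (x - c) ** 2) for c, h in zip(centers, widths)]


def fit_weights(n_basis: int = 20, dt: float = 0.001, y0: float = 0.0, g: float = 1.0):
    """Locally weighted regression of the forcing weights from the demonstration."""
    centers = [math.exp(-ALPHA_X * i / (n_basis - 1)) for i in range(n_basis)]
    widths = [n_basis**1.5 / c / ALPHA_X for c in centers]  # a common width heuristic
    num, den = [0.0] * n_basis, [0.0] * n_basis
    for k in range(int(round(1.0 / dt)) + 1):
        t = k * dt
        y, yd, ydd = min_jerk(t)
        x = math.exp(-ALPHA_X * t / TAU)
        # Forcing term that would reproduce the demonstration exactly.
        f_target = TAU**2 * ydd - ALPHA_Z * (BETA_Z * (g - y) - TAU * yd)
        s = x * (g - y0)
        for i, psi in enumerate(basis(x, centers, widths)):
            num[i] += s * psi * f_target
            den[i] += s * s * psi
    return centers, widths, [n / d for n, d in zip(num, den)]


def rollout(centers, widths, weights, y0=0.0, g=1.0, dt=0.001, duration=1.0):
    """Explicit-Euler integration of the DMP; returns sampled positions and velocities."""
    x, y, z = 1.0, y0, 0.0
    ys, vs = [y], [0.0]
    for _ in range(int(round(duration / dt))):
        psi = basis(x, centers, widths)
        f = sum(p * w for p, w in zip(psi, weights)) / sum(psi) * x * (g - y0)
        dz = (ALPHA_Z * (BETA_Z * (g - y) - z) + f) / TAU
        y += z / TAU * dt
        z += dz * dt
        x += -ALPHA_X * x / TAU * dt
        ys.append(y)
        vs.append(z / TAU)
    return ys, vs


def zbf_certified(ys, vs, y_max: float, gamma: float) -> bool:
    """Check h >= 0 and dh/dt + gamma * h >= 0 for h(y) = y_max - y at every sample."""
    return all(y_max - y >= 0.0 and -v + gamma * (y_max - y) >= 0.0
               for y, v in zip(ys, vs))


centers, widths, weights = fit_weights()
ys, vs = rollout(centers, widths, weights)
print(round(ys[-1], 2))
print(zbf_certified(ys, vs, y_max=1.05, gamma=20.0))  # bound placed beyond the goal
print(zbf_certified(ys, vs, y_max=0.9, gamma=20.0))   # bound the motion must cross
```

The second check fails by construction: the demonstrated motion ends at 1.0, which lies outside y <= 0.9. That failure is the situation in which CDMP would perturb the weights instead of accepting the rollout.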
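Prior work has shown that words are hyperarticulated along the phonetic dimensions that distinguish them from their minimal-pair competitors. This phenomenon has been termed contrastive hyperarticulation (CH). We present a dynamic neural field (DNF) model of voice onset time (VOT) planning that derives CH from an inhibitory influence of the minimal-pair competitor. We test some predictions of the model with a novel experiment investigating CH of voiceless stop consonant VOT in pseudowords. The results demonstrate a CH effect in pseudowords, consistent with a basis for the effect in the real-time planning and production of speech. The scope and magnitude of CH in pseudowords were reduced compared to CH in real words, consistent with a role for interactive activation between lexical and phonological levels of planning. We discuss the potential of the model to unify an apparently disparate set of phenomena, from CH to phonological neighborhood effects to phonetic trace effects in speech errors.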
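The sketch below uses a generic Amari-style dynamic neural field over a VOT axis. It shows how a broad inhibitory input centred on a competitor's VOT can push the field's activation peak away from that competitor, which is the mechanism the abstract uses to derive CH. This is not the authors' model. The grid (0 to 100 ms), the interaction kernel, the input amplitudes and widths, and the target and competitor VOTs (60 ms and 20 ms) are all hypothetical.

```python
import math
from typing import List, Optional

N = 101  # hypothetical VOT axis: 0..100 ms in 1 ms steps


def gauss(x: float, mu: float, sigma: float) -> float:
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2)


def sigmoid(u: float, beta: float = 4.0) -> float:
    """Output nonlinearity f(u) of the field."""
    return 1.0 / (1.0 + math.exp(-beta * u))


# Interaction kernel w(d): local excitation minus uniform global inhibition (assumed strengths),
# stored so that KERNEL[d + N - 1] == w(d) for d in -(N-1)..(N-1).
KERNEL = [1.5 * gauss(d, 0.0, 4.0) - 0.3 for d in range(-(N - 1), N)]


def settle(target: float, competitor: Optional[float] = None, steps: int = 200,
           dt_over_tau: float = 0.1, h: float = -5.0) -> List[float]:
    """Euler-integrate tau*du/dt = -u + h + s(x) + sum_x' w(x - x') f(u(x'))."""
    s = [8.0 * gauss(x, target, 15.0) for x in range(N)]
    if competitor is not None:
        # Broad inhibitory input centred on the competitor's VOT (assumed form).
        s = [si - 4.0 * gauss(x, competitor, 30.0) for x, si in enumerate(s)]
    u = [h] * N
    for _ in range(steps):
        f = [sigmoid(v) for v in u]
        u = [u[i] + dt_over_tau * (-u[i] + h + s[i]
             + sum(KERNEL[i - j + N - 1] * f[j] for j in range(N)))
             for i in range(N)]
    return u


def peak(u: List[float]) -> int:
    """VOT (ms) at which the field activation is highest."""
    return max(range(len(u)), key=u.__getitem__)


alone = settle(target=60.0)
with_rival = settle(target=60.0, competitor=20.0)
print(peak(alone), peak(with_rival))
```

Because the competitor's inhibition is stronger on the peak's short-VOT flank, the self-stabilized peak settles at a longer VOT than it would without the competitor. In this toy setting that displacement is the "hyperarticulation".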
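Recent research has proposed a series of specialized optimization algorithms for deep multi-task models. These multi-task optimization (MTO) methods are often claimed to yield solutions superior to those obtained by simply optimizing a weighted average of the task losses. In this paper, we perform large-scale experiments on a variety of language and vision tasks to examine the empirical validity of these claims. We show that, despite the added design and computational complexity of these algorithms, MTO methods do not yield any performance improvements beyond what is achievable with traditional optimization approaches. We highlight alternative strategies that consistently improve the performance profile and point out common training pitfalls that may cause suboptimal results. Finally, we outline the challenges in reliably evaluating the performance of MTO algorithms and discuss potential solutions.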
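The "traditional" baseline the abstract compares against is plain optimization of a weighted average of task losses (scalarization). The minimal sketch below uses two hypothetical quadratic task losses that share a single parameter. Gradient descent on the weighted sum converges to the weighted mean of the per-task optima. The specialized MTO algorithms the paper evaluates are not shown. The targets, weights and learning rate are assumptions.

```python
from typing import Sequence


def scalarized_gd(targets: Sequence[float], weights: Sequence[float],
                  lr: float = 0.1, steps: int = 500) -> float:
    """Minimize sum_k w_k * (theta - t_k)^2 by gradient descent on one shared parameter."""
    theta = 0.0
    for _ in range(steps):
        grad = sum(2.0 * w * (theta - t) for w, t in zip(weights, targets))
        theta -= lr * grad
    return theta


targets = [1.0, 3.0]    # optimum of each task's loss (hypothetical)
weights = [0.25, 0.75]  # task weights of the weighted average (hypothetical)
theta = scalarized_gd(targets, weights)
print(round(theta, 6))  # 2.5, the weighted mean of the task optima
```

In a deep multi-task model the shared parameter becomes the shared network weights and the quadratics become task losses, but the objective being optimized has the same form.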
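The core principle of variational inference (VI) is to convert the statistical inference problem of computing complex posterior probability densities into a tractable optimization problem. This property makes VI faster than several sampling-based techniques. However, traditional VI algorithms do not scale to large datasets and cannot readily infer out-of-sample data points without re-running the optimization process. Recent developments in the field, such as stochastic, black-box, and amortized VI, have helped address these issues. Generative modeling tasks nowadays make wide use of amortized VI for its efficiency and scalability, as it uses a parameterized function to learn the parameters of the approximate posterior density. In this paper, we review the mathematical foundations of various VI techniques to form the basis for understanding amortized VI. Additionally, we provide an overview of recent trends that address several issues of amortized VI, such as the amortization gap, generalization issues, inconsistent representation learning, and posterior collapse. Finally, we analyze alternative divergence measures that improve VI optimization.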
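The sketch below makes the VI objective and amortization concrete on a toy conjugate Gaussian model: z ~ N(0, 1), x | z ~ N(z, 1). For this model the log evidence log p(x) = log N(x; 0, 2) and the ELBO for q(z) = N(m, s^2) both have closed forms. The model, the linear encoder x -> (a * x, s) and its parameter names are assumptions for illustration. In practice the encoder is a neural network and the ELBO is estimated by Monte Carlo. With a = 1/2 and s^2 = 1/2 the encoder outputs the exact posterior for every x, so the ELBO equals the log evidence and a new data point needs no per-point optimization.

```python
import math

LOG_2PI = math.log(2.0 * math.pi)


def log_evidence(x: float) -> float:
    """Exact log p(x) for the toy model; the marginal of x is N(0, 2)."""
    return -0.5 * math.log(2.0 * math.pi * 2.0) - x * x / 4.0


def elbo(x: float, m: float, s: float) -> float:
    """Closed-form ELBO = E_q[log p(x|z)] + E_q[log p(z)] + H[q] for q(z) = N(m, s^2)."""
    expected_loglik = -0.5 * LOG_2PI - 0.5 * ((x - m) ** 2 + s * s)
    expected_logprior = -0.5 * LOG_2PI - 0.5 * (m * m + s * s)
    entropy = 0.5 * math.log(2.0 * math.pi * math.e * s * s)
    return expected_loglik + expected_logprior + entropy


def amortized_encoder(x: float, a: float = 0.5, log_s: float = 0.5 * math.log(0.5)):
    """Hypothetical amortized encoder: one parameterized map x -> (m, s) shared by all points."""
    return a * x, math.exp(log_s)


for x in (-1.0, 0.3, 2.0):
    m, s = amortized_encoder(x)
    gap = log_evidence(x) - elbo(x, m, s)  # equals KL(q || p(z | x)); ~0 for the exact posterior
    print(x, gap)
```

Setting the encoder slope to anything other than 1/2 leaves a positive gap for every x != 0. That gap is the KL divergence from q to the true posterior.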
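Traditional physics-based modeling is a time-consuming bottleneck in control design for complex nonlinear systems such as autonomous underwater vehicles (AUVs). In contrast, purely data-driven models, though convenient and quick to obtain, require a large number of observations and lack operational guarantees for safety-critical systems. Data-driven models that leverage available, partially characterized dynamics have the potential to provide reliable system models for high-value complex systems in typical data-limited scenarios, thereby avoiding months of expensive expert modeling time. In this work, we explore this middle ground between expert models and purely data-driven modeling. We present control-oriented parametric models with varying levels of domain awareness that exploit known system structure and prior physics knowledge to create constrained deep neural dynamical system models. We employ universal differential equations to construct data-driven black-box and gray-box representations of the AUV dynamics. In addition, we explore a hybrid formulation that explicitly models the residual error associated with imperfect gray-box models. We compare the predictive performance of the learned models across different distributions of initial conditions and control inputs to assess their accuracy, generalization, and suitability for control.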
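The hybrid idea, an imperfect gray-box physics model plus a learned model of its residual error, can be illustrated on a hypothetical 1-D surge model, m * dv/dt = u - d1 * v - d2 * v|v|. This is not a universal differential equation. The "learned" part here is a single coefficient on the feature v|v|, fit by closed-form least squares, standing in for the neural network a UDE would embed. All parameter values and the training grid are invented.

```python
import math
from typing import List, Tuple

M, D1, D2 = 50.0, 10.0, 20.0  # "true" mass and drag coefficients (invented for illustration)


def true_accel(v: float, u: float) -> float:
    """Ground-truth dynamics with linear and quadratic drag."""
    return (u - D1 * v - D2 * v * abs(v)) / M


def graybox_accel(v: float, u: float) -> float:
    """Imperfect physics model: knows mass and linear drag, omits quadratic drag."""
    return (u - D1 * v) / M


def fit_residual(samples: List[Tuple[float, float]]) -> float:
    """Least-squares fit of residual(v, u) ~= theta * v|v| (one feature, closed form)."""
    num = sum((true_accel(v, u) - graybox_accel(v, u)) * v * abs(v) for v, u in samples)
    den = sum((v * abs(v)) ** 2 for v, u in samples)
    return num / den


def hybrid_accel(v: float, u: float, theta: float) -> float:
    """Gray-box model plus the learned residual term."""
    return graybox_accel(v, u) + theta * v * abs(v)


train = [(0.1 * i, 20.0 * math.sin(i)) for i in range(-20, 21)]  # (velocity, thrust) pairs
theta = fit_residual(train)
print(round(theta, 6))  # -0.4 = -D2 / M: the residual recovers the missing drag term
```

Because the residual feature matches the missing physics exactly, the hybrid model is exact here. With a mis-specified residual model the learned term would only reduce, not eliminate, the gray-box error.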
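Atmospheric effects, such as turbulence and background thermal noise, inhibit the propagation of the coherent light used in on-off keying (OOK) free-space optical communication. Here we present and experimentally validate a convolutional neural network that reduces the bit error rate of free-space optical communication in post-processing and is significantly simpler and cheaper than existing solutions based on advanced optics. Our approach consists of two neural networks: the first determines whether coherent bit sequences are present in thermal noise and turbulence, and the second demodulates those coherent bit sequences. All data used to train and test our networks were obtained experimentally by generating OOK bit streams of coherent light, combining them with thermal light, and passing the resulting light through a turbulent water tank, which we verified mimics atmospheric turbulence to a high degree of accuracy. Our convolutional neural networks improve detection accuracy compared with threshold classification schemes and can be integrated with current demodulation and error-correction schemes.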
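The abstract measures its networks against threshold classification. The minimal sketch below shows that baseline and how a bit error rate is computed, using a deliberately simplified channel: received intensity 1.0 or 0.0 plus additive white Gaussian noise, with a fixed decision threshold of 0.5. Thermal light statistics and turbulence-induced fading are not modeled. The noise levels, bit count and seed are assumptions.

```python
import random
from typing import List


def ook_transmit(bits: List[int], noise_std: float, rng: random.Random) -> List[float]:
    """On-off keying: intensity 1.0 for a 1, 0.0 for a 0, plus additive Gaussian noise."""
    return [b + rng.gauss(0.0, noise_std) for b in bits]


def threshold_demod(samples: List[float], threshold: float = 0.5) -> List[int]:
    """Threshold classifier: decide 1 when the received intensity exceeds the threshold."""
    return [1 if s > threshold else 0 for s in samples]


def bit_error_rate(sent: List[int], received: List[int]) -> float:
    return sum(a != b for a, b in zip(sent, received)) / len(sent)


rng = random.Random(0)
bits = [rng.randint(0, 1) for _ in range(10_000)]
for noise_std in (0.1, 0.3, 0.5):
    ber = bit_error_rate(bits, threshold_demod(ook_transmit(bits, noise_std, rng)))
    print(noise_std, ber)
```

Under this idealized channel the error rate follows the Gaussian tail beyond the threshold, so it grows quickly once the noise standard deviation becomes comparable to the 0.5 decision margin. Real turbulence and thermal light are harder to model than this, which is where a learned demodulator can outperform a fixed threshold.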